pong: fast analysis and visualization of latent clusters in population genetic data
نویسندگان
چکیده
MOTIVATION A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. RESULTS We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. AVAILABILITY AND IMPLEMENTATION pong is freely available and can be installed using the Python package management system pip. pong's source code is available at https://github.com/abehr/pong CONTACT [email protected] or [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
Op-cbio160329 2817..2823
Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixedmembership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the ...
متن کاملThe Major Sources of Genetic Differentiation Among Apricot Latent Virus (ApLV) Isolates
Background and Aims: Apricot latent virus (ApLV) is a species within Foveavirus genus (Betaflexiviridae family, Tymovirales order). Phylogenetic analyses using different ORFs nucleotide sequences divided most ApLV isolates into two clusters. However, there is little data about the sources of genetic differentiation among ApLV isolates. Materials and Methods: Partial coat protein (CP) sequences...
متن کاملData Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...
متن کاملVisualization of Pairwise and Multilocus Linkage Disequilibrium Structure Using Latent Forests
Linkage disequilibrium study represents a major issue in statistical genetics as it plays a fundamental role in gene mapping and helps us to learn more about human history. The linkage disequilibrium complex structure makes its exploratory data analysis essential yet challenging. Visualization methods, such as the triangular heat map implemented in Haploview, provide simple and useful tools to ...
متن کاملData Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 32 شماره
صفحات -
تاریخ انتشار 2016